Multiple imputation with compatibility for high-dimensional data

Abstract

Multiple Imputation (MI) is always challenging in high-dimensional settings. The imputation model with some selected number of predictors can be incompatible with the analysis model, leading to inconsistent and biased estimates. Although compatibility in such cases may not be achieved, one can obtain consistent and unbiased estimates using a semi-compatible imputation model. We propose to relax the lasso penalty for selecting a large set of variables (at most n). The substantive model that also uses some formal variable selection procedure in high-dimensional structures is then expected to be nested in this imputation model. The resulting imputation model will be semi-compatible with high probability. The likelihood estimates can be unstable and can face convergence issues as the number of variables becomes nearly as large as the sample size. To address these issues, we further propose to use a ridge penalty for obtaining the posterior distribution of the parameters based on the observed data. The proposed technique is compared with the standard MI software and MI techniques available for high-dimensional data in simulation studies and a real life dataset. Our results exhibit the superiority of the proposed approach to the existing MI approaches while addressing the compatibility issue.
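
As an illustration of why a ridge penalty helps when the number of predictors approaches the sample size, the following is a minimal sketch and not the authors' implementation. It assumes a single incomplete continuous variable, skips the lasso-based selection step entirely, and swaps in a bootstrap of the observed rows as a rough stand-in for drawing parameters from a posterior distribution. All names (`ridge_fit`, `impute_once`, `lam`) and the toy data are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ridge_fit(X, y, lam):
    """Solve (X'X + lam*I) beta = X'y by Gaussian elimination.

    With lam > 0 the system is positive definite, so it stays solvable
    even when there are more predictors than observed rows.
    """
    p = len(X[0])
    # Augmented matrix [X'X + lam*I | X'y].
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(p)] + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (A[i][p] - s) / A[i][i]
    return beta


def impute_once(X, y, lam, rng):
    """Return one completed copy of y; None marks a missing entry."""
    obs = [i for i, v in enumerate(y) if v is not None]
    # Assumption: bootstrap resampling approximates parameter uncertainty.
    boot = [rng.choice(obs) for _ in obs]
    Xb = [X[i] for i in boot]
    yb = [y[i] for i in boot]
    beta = ridge_fit(Xb, yb, lam)
    rss = sum((yi - dot(xi, beta)) ** 2 for xi, yi in zip(Xb, yb))
    sigma = math.sqrt(rss / len(boot))
    # Observed values are kept; missing ones get prediction plus noise.
    return [v if v is not None else dot(X[i], beta) + rng.gauss(0.0, sigma)
            for i, v in enumerate(y)]


# Toy data: 30 rows, 25 predictors, 8 missing outcomes, so only 22
# observed rows are available to fit 25 coefficients.
rng = random.Random(0)
n, p = 30, 25
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]
missing = set(rng.sample(range(n), 8))
y_incomplete = [None if i in missing else v for i, v in enumerate(y)]

# Five imputed data sets, each from an independent bootstrap refit.
imputations = [impute_once(X, y_incomplete, lam=1.0, rng=rng)
               for _ in range(5)]
```

In this toy setting an unpenalized least-squares fit would have no unique solution, because there are fewer observed rows than coefficients. The penalty `lam` keeps the linear system well posed, which mirrors the stability argument in the abstract.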

Similar resources

Multiple imputation and analysis for high‐dimensional incomplete proteomics data

Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case-control study of 135 incident cases of myocardial infarction and 135 pair-matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case-control pairs (N = 135), and the maj...

Multiple Imputation for General Missing Data Patterns in the Presence of High-dimensional Data

Multiple imputation (MI) has been widely used for handling missing data in biomedical research. In the presence of high-dimensional data, regularized regression has been used as a natural strategy for building imputation models, but limited research has been conducted for handling general missing data patterns where multiple variables have missing values. Using the idea of multiple imputation b...

Multiple imputation and random forests (MIRF) for unobservable, high-dimensional data.

Understanding the genetic underpinnings to complex diseases requires consideration of sophisticated analytical methods designed to uncover intricate associations across multiple predictor variables. At the same time, knowledge of whether single nucleotide polymorphisms within a gene are on the same (in cis) or on different (in trans) chromosomal copies, may provide crucial information about mea...

Multiple Imputation for Missing Data

Multiple imputation provides a useful strategy for dealing with data sets with missing values. Instead of filling in a single value for each missing value, Rubin’s (1987) multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed by using standard proc...

Methods for regression analysis in high-dimensional data

With advances in science, knowledge and technology, new and precise methods for measuring, collecting and recording information have been developed, which have led to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...

Journal

Journal title: PLOS ONE

Year: 2021

ISSN: 1932-6203

DOI: https://doi.org/10.1371/journal.pone.0254112